Journal article

SynJAC: synthetic-data-driven joint-granular adaptation and calibration for domain specific scanned document key information extraction

Y Ding, SC Han, Z Li, H Chung

Information Fusion | Elsevier BV | Published: 2026

Abstract

Visually Rich Documents (VRDs), comprising elements such as charts, tables, and paragraphs, convey complex information across diverse domains. However, extracting key information from these documents remains labour-intensive, particularly for scanned formats with inconsistent layouts and domain-specific requirements. Despite advances in pretrained models for VRD understanding, their dependence on large annotated datasets for fine-tuning hinders scalability. This paper proposes SynJAC (Synthetic-data-driven Joint-granular Adaptation and Calibration), a method for key information extraction in scanned documents. SynJAC leverages synthetic, machine-generated data for domain adaptation and emplo..

University of Melbourne Researchers